Abstract

All  failure  detection  methods  are  based,  either  explicitly  or 
implicitly,  on  the  use  of  redundancy,  i.e.  on  (possibly  dynamic) 
relations  among  the  measured  variables.  The  robustness  of  the  failure 
detection  process  consequently  depends  to  a  great  degree  on  the 
reliability  of  the  redundancy  relations,  which  in  turn  is  affected  by 
the  inevitable  presence  of  model  uncertainties.  In  this  paper  we 
address  the  problem  of  determining  redundancy  relations  that  are 
optimally  robust,  in  a  sense  that  includes  several  major  issues  of 
importance  in  practical  failure  detection,  and  that  provides  a 
significant  amount  of  intuition  concerning  the  geometry  of  robust 
failure  detection.  We  also  give  a  procedure,  involving  the  construction 
of  a  single  matrix  and  its  singular  value  decomposition,  for  the 
determination  of  a  complete  sequence  of  redundancy  relations,  ordered  in 
terms  of  their  level  of  robustness.  This  procedure  also  provides  the 
basis  for  comparing  levels  of  robustness  in  redundancy  provided  by 
different sets of sensors.
1. Introduction


A  wide  variety  of  techniques  has  been  proposed  in  recent  years  for 
the detection, isolation, and accommodation of failures in dynamic
systems (see, for example, the surveys in [1,4]). In one way or another,
all of these methods involve the generation of signals that are
accentuated by the presence of particular failures if these failures
have  actually  occurred.  The  procedures  for  generating  these  signals  in 
turn  depend  on  models  relating  the  measured  variables.  Consequently,  if 
any  errors  in  these  models  have  effects  on  the  observables  that  are  at 
all  like  the  effects  of  any  of  the  failure  modes,  then  these  model 
errors  may  also  accentuate  the  signals.  This  leads  us  directly  to  the 
issue  of  robust  failure  detection,  that  is,  the  design  of  a  system  that 
is  maximally  sensitive  to  the  effects  of  failures  and  minimally 
sensitive  to  model  errors. 

The  work  described  here  focuses  on  directly  designing  a  failure 
detection  system  that  is  insensitive  to  model  errors  (rather  than 
designing  a  system  that  attempts  to  compensate  the  detection  algorithm 
by  estimating  uncertainties  on-line,  see  [6,  7,  12]).  The  initial 
impetus  for  our  approach  came  from  the  work  reported  in  [5,  13],  in  the 
context  of  aircraft  failure  detection.  The  noteworthy  feature  of  that 
project  was  that  the  dynamics  of  the  aircraft  were  decomposed  in  order 
to  analyze  the  relative  reliability  of  each  individual  source  of 
potentially  useful  failure  detection  information.  In  this  way,  a  design 
was  developed  that  utilized  only  the  most  reliable  information. 

In  [2]  we  presented  the  results  of  our  initial  attempt  to  extract 
the  essence  of  the  method  used  in  [9,  13]  in  order  to  develop  a  general 
approach  to  robust  failure  detection.  As  discussed  in  those  references 
and  in  others  (such  as  [3,  7,  8]),  all  failure  detection  systems  are 
based  on  exploiting  analytical  redundancy  relations  or  (generalized) 
parity  checks.  These  are  simply  functions  of  the  temporal  histories  of 
the  measured  quantities  that  have  the  property  of  being  small  (ideally 
zero)  when  the  system  is  operating  normally.  Essentially  all  of  the 
recently  developed  general  approaches  to  failure  detection  make 
implicit, rather than explicit, use of all of these relations. That is,
these  general  methods  use  an  overall  dynamic  model  as  the  basis  for 
designing  failure  detection  algorithms.  While  such  a  model  certainly 
captures  all  of  the  relationships  among  the  measured  variables,  it  does 
not  in  any  way  discriminate  among  these  individual  relationships.  For 
this  reason,  a  top-down  application  of  any  of  these  methods  mixes 
together information of varying levels of reliability. A general method
for explicitly identifying and utilizing only the most reliable of the
redundancy relations would clearly be preferable.


One  criterion  for  measuring  the  reliability  of  a  particular 
redundancy  relation  was  presented  in  [2]  and  was  used  to  pose  an 
optimization  problem  to  determine  the  most  reliable  relation.  This 
criterion  has  the  feature  that  it  specifies  robustness  with  respect  to  a 
particular  operating  point,  thereby  allowing  the  possibility  of 
adaptively  choosing  the  best  relations.  However,  a  drawback  of  this 
approach  is  that  it  leads  to  an  extremely  complex  optimization  problem. 
Moreover,  if  one  is  interested  in  obtaining  a  list  of  redundancy 
relations  that  is  ordered  from  most  to  least  reliable,  one  must 
essentially  solve  a  separate  optimization  problem  for  each  relation  in 
the  list. 


In  this  paper  we  look  at  an  alternative  measure  of  reliability  for 
a  redundancy  relation.  Not  only  does  this  alternative  have  a  helpful 
geometric  interpretation,  but  it  also  leads  to  a  far  simpler 
optimization  procedure,  involving  a  single  singular  value  decomposition. 
In  addition,  it  allows  us  in  a  natural  and  computationally  feasible  way 
to  consider  issues  such  as  scaling,  relative  merits  of  alternative 
sensor  sets,  and  explicit  tradeoffs  between  detectability  and 
robustness. 


In  Section  2  we  review  the  notion  of  analytical  redundancy  for 
perfectly  known  models,  and  then  provide  a  geometric  interpretation  that 
forms  the  starting  point  for  our  investigation  of  robust  failure 
detection.  Section  3  addresses  the  problem  of  robustness  using  our 
geometric  ideas,  and  solves  a  version  of  the  optimally  robust  redundancy 
problem.  In  Section  4  we  discuss  extensions  to  include  three  important 
issues  not  included  in  Section  3:  noise,  known  inputs,  and  the 
detection/robustness  tradeoff.  We  conclude  the  paper  in  Section  5  with 
a  discussion  of  several  other  topics,  including  the  relationship  of  our 
results  to  those  in  [2]  and  the  use  of  this  formalism  to  measure  and 
compare the levels of robust redundancy associated with different system
configurations.

2. Redundancy Relations


This paper focuses attention on linear, time-invariant, discrete-time
systems. In this section we consider the uncertainty-free model

x(k+1) = Ax(k) + Bu(k) ,  (1)

y(k) = Cx(k) + Du(k) ,  (2)


where x is an n-dimensional state vector, u is an m-dimensional vector
of known inputs, y is an r-dimensional vector of measured outputs, and
A,  B,  C  and  D  are  known  matrices  of  appropriate  dimensions.  A 
redundancy  relation  for  this  model  is  some  linear  combination  of  present 
and  lagged  values  of  u  and  y  that  is  identically  zero  if  no  changes 
(i.e.  failures)  occur  in  (1),  (2). 


As  discussed  in  [2],  redundancy  relations  can  be  specified 
mathematically in the following way. The subspace of (s+1)r-dimensional
vectors given by

P = \{ v : v^T \begin{bmatrix} C \\ CA \\ \vdots \\ CA^s \end{bmatrix} = 0 \}  (3)

is called the parity space of order s (to be distinguished from the
s-step unobservable subspace, which corresponds to the right null space of
the  matrix  in  (3)  rather  than  its  left  null  space).  We  shall  denote 
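(s+1)r by N. Every vector v in (3) can be associated at any time k with
a parity check r(k):

r(k) = v^T \left[ \begin{bmatrix} y(k-s) \\ \vdots \\ y(k) \end{bmatrix} - H \begin{bmatrix} u(k-s) \\ \vdots \\ u(k) \end{bmatrix} \right] ,  (4)

where

H = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{s-1}B & CA^{s-2}B & \cdots & D \end{bmatrix} .  (5)
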
(The development in Sections 2 to 4 deals with a single, fixed value of
s. Therefore, to avoid notational clutter, we shall not index subspaces
such as P in (3) or matrices such as H in (4) with the subscript s.
Consideration of different values of s is contained in Section 5.) By
(1), (2), the quantity in brackets [.] in (4) equals

\begin{bmatrix} C \\ CA \\ \vdots \\ CA^s \end{bmatrix} x(k-s) .  (6)


Hence,  by  (3),  we  see  that  the  simple  redundancy  relation  or  parity 
check 


r(k)  -  0  (7) 
is  satisfied. 
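
As a concrete illustration of (1)-(7), the following minimal Python sketch
builds one parity check for a small model and confirms that it vanishes on
failure-free data. The numerical values of A, B, C and D, the input
sequence, the sensor-bias failure, and all function names are hypothetical
choices made only for this illustration. The adjustment for known inputs
through a lower-triangular matrix H of Markov parameters is assumed here as
one standard way of forming the bracketed quantity in (4).

```python
# Sketch under stated assumptions: one parity check for a hypothetical
# 2-state, single-input, single-output model of the form (1), (2).

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def cross(a, b):
    """Cross product in R^3; the result is orthogonal to both a and b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Hypothetical model matrices (n = 2, m = 1, r = 1).
A = [[0.9, 0.2], [0.0, 0.7]]
B = [[0.0], [1.0]]
C = [[1.0, 0.0]]
D = [[0.0]]
S = 2  # order s of the parity space, so N = (s+1)r = 3

# Rows C, CA, CA^2 and the Markov parameters CB, CAB, CA^2 B.
obs_rows, markov = [], []
M = C
for _ in range(S + 1):
    obs_rows.append(M[0])
    markov.append(matmul(M, B)[0][0])
    M = matmul(M, A)

d = D[0][0]
# Lower-triangular map from the input window to its effect on the output window.
H = [[d, 0.0, 0.0],
     [markov[0], d, 0.0],
     [markov[1], markov[0], d]]

# Left null vector of the 3x2 matrix [C; CA; CA^2]: orthogonal to both columns.
v = cross([row[0] for row in obs_rows], [row[1] for row in obs_rows])

def simulate(steps, bias_from=None, bias=0.5):
    """Run the model; optionally add a constant sensor bias from step bias_from."""
    x = [1.0, -0.5]
    ys, us = [], []
    for k in range(steps):
        u = 1.0 if k % 3 == 0 else -0.4  # arbitrary known input
        y = C[0][0] * x[0] + C[0][1] * x[1] + d * u
        if bias_from is not None and k >= bias_from:
            y += bias
        ys.append(y)
        us.append(u)
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0][0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1][0] * u]
    return ys, us

def parity_check(ys, us, k):
    """r(k) = v^T (Y - H U) over the window k-s, ..., k."""
    Y, U = ys[k - S:k + 1], us[k - S:k + 1]
    adjusted = [Y[i] - sum(H[i][j] * U[j] for j in range(S + 1)) for i in range(S + 1)]
    return sum(vi * ai for vi, ai in zip(v, adjusted))
```

With these definitions, parity_check stays at rounding-error level on data
from simulate(12) and departs from zero once a bias is injected, for
example with simulate(12, bias_from=6).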

It  is  evident  from  (4)  and  (7)  that  a  redundancy  relation  is  simply 
an  input-output  model  for  (or  constraint  on)  part  of  the  dynamics  of  the 
system  (1),  (2).  This  interpretation  of  a  redundancy  relation  allows  us 
to  make  contact  with  the  numerous  existing  failure  detection  methods. 
These  methods  are  typically  based  on  a  noisy  version  of  the  model  (1), 
(2)  that  represents  normal  system  behavior,  together  with  a  set  of 
deviations  from  this  model  that  represent  the  several  failure  modes. 
However,  rather  than  applying  such  methods  to  a  single,  all-encompassing 
model  as  in  (1),  (2),  one  could  alternatively  apply  the  same  techniques 
to  individual  models  as  in  (4),  (7),  or  to  a  combination  of  several  of 
these,  which  serves  to  isolate  individual  (or  specific  groups  of)  parity 
checks.  (See  Section  5  for  some  further  comments  on  this  point.)  This  is 
precisely  what  was  done  in  [3,  13],  for  example.  The  advantage  of  such 
an  approach  is  that  it  allows  one  to  separate  the  information  provided 
by  redundancy  relations  of  differing  levels  of  reliability,  something 
that  is  not  easily  done  when  one  starts  with  the  overall  model  (1),  (2), 
which  combines  all  redundancy  relations. 

In  the  next  two  sections  we  address  the  main  problem  of  this  paper, 
which  is  the  determination  of  optimally  robust  redundancy  relations. 
The  key  to  this  approach  is  obtained  by  re-examining  (3)-(7),  in  order 
to  suggest  a  geometrical  interpretation  of  parity  relations.  In 
particular,  consider  the  model  (1),  (2)  and  let  Z  denote  the  range  of 
the  matrix  in  (3).  Then  the  parity  space  P  is  the  orthogonal  complement 
of  Z,  and  a  complete  set  of  parity  checks,  of  order  s  and  of  the  form 
(4), (7), is given by the orthogonal projection of the vector of
input-adjusted observations

\begin{bmatrix} y(k-s) \\ \vdots \\ y(k) \end{bmatrix} - H \begin{bmatrix} u(k-s) \\ \vdots \\ u(k) \end{bmatrix}  (8)

onto P.

To  illustrate  this,  consider  an  example  in  which  the  first  two 
components  of  y  measure  scaled  versions  of  the  same  variable,  i.e. 

y2(k) = a y1(k) .  (9)

Then, as illustrated in Figure 1, the subspace Z in y1 - y2 space is
simply the line specified by (9). Furthermore, in this case the obvious
parity relation is

r(k) = y2(k) - a y1(k) ,  (10)

which is nothing more than the orthogonal projection of the observed
pair of values y1(k) and y2(k) onto the line P perpendicular to Z
(Figure  1).  For  interpretations  of  the  space  P  in  purely  matrix  terms 
and  in  terms  of  polynomial  matrices,  we  refer  the  reader  to  [9]  and  [3], 
respectively.  It  is  the  geometric  interpretation,  however,  that  we 
shall  utilize  here. 


Figure 1: An Example of the Geometric Interpretation of Parity
Relations.


3. A Geometric Approach to Robust Redundancy

To  begin,  let  us  focus  on  a  model  that  is  not  driven  by  either 
unknown  noise  or  known  signals: 


x(k+1) = Aq x(k) ,  (11)

y(k) = Cq x(k) ,  (12)

where q indexes the models associated with different possible values of
the unknown parameters. Throughout this paper (except for a brief
discussion in Section 5), we consider only the case where q is taken
from a finite set of possibilities, say q = 1, 2, ..., Q. In practice, this
might  involve  choosing  representative  points  out  of  the  actual, 
continuous  range  of  parameter  values,  reflecting  any  desired  weighting 
on  the  likelihood  or  importance  of  particular  sets  of  parameter  values. 

Define the (s-step) observation space Zq by

Z_q = range \begin{bmatrix} C_q \\ C_q A_q \\ \vdots \\ C_q A_q^s \end{bmatrix} .  (13)


This  is  the  subspace  in  which  the  window  of  observations  for  the  system 
(11),  (12)  lives,  as  x(k-s)  varies  over  all  possible  values.  For  a  given 
q,  the  parity  space  is  the  orthogonal  complement,  Pq,  of  Zq.  However, 
the  orthogonal  complement  of  one  observation  space  will  not  be  the 
orthogonal  complement  of  another  distinct  observation  space.  It  is 
therefore  in  general  impossible  to  find  parity  checks  that  are  perfect 
for  all  possible  values  of  q.  That  is,  in  general  we  cannot  find  a 
subspace  P  that  is  orthogonal  to  Zq  for  all  q. 

What  would  seem  to  make  sense  in  this  case  is  to  choose  a  subspace 
P that is "as orthogonal as possible" to all possible Zq. Returning to
our simple example, suppose that y2 = a y1 but that 'a' is only known to
lie in some interval. In this case we obtain the picture shown in Figure
2. The shaded region here represents the range of (y1, y2) values
consistent  with  the  uncertainty  in  'a'.  Intuitively,  what  would  seem  to 
be  a  good  choice  for  P  (assuming  that  'a'  is  equally  likely  to  lie 
anywhere  in  the  interval  (24))  is  the  line  that  bisects  the  obtuse  angle 
between  the  shaded  sectors  in  Figure  2.  It  is  precisely  this  geometric 
picture  that  is  generalized  and  built  upon  in  this  paper. 


For the general case, our procedure will be to first compute an
average observation space Z0 that is as close as possible, in a sense to
be made precise, to all of the Zq. We shall then choose P to be the
orthogonal complement of Z0. (This idea is also illustrated in Figure 2,
where the average observation space Z0 is depicted as the line that
bisects the shaded region, and the line P then represents its orthogonal
complement.) Note that the Zq are subspaces of possibly differing
dimensions, embedded in a space of dimension N = (s+1)r, corresponding
to histories of the last s+1 values of the r-dimensional output.
Consequently, if we would like to determine the p best parity checks (so
that dim P = p), we need to find a subspace Z0 of dimension N-p.

A Preliminary Scaling: Before stating the criterion that defines the
average observation space Z0, it
is  necessary  to  take  account  of  a  fact  that  has  been  glossed  over  so 
far.  It  is  not  sufficient  to  simply  examine  the  subspaces  in  which 
signals  lie;  one  has  also  to  consider  the  characteristic  magnitudes  and 
directions  of  the  excursions  of  signals  in  the  subspaces  to  which  they 
are  confined.  It  will  typically  be  the  case  that  some  components  (or 
combinations  of  components)  of  x(k-s)  are  larger  than  others,  because 
they  may  be  measured  in  different  units  and  excited  differently.  Hence 
certain  excursions  in  observation  space  are  more  likely  than  others.  To 
take  account  of  this,  assume  for  now  that  we  are  able  to  find  a 
nonsingular  scaling  matrix  Mq  such  that,  with  the  change  of  basis 

x = Mq w ,  (14)

one  obtains  a  variable  w  that  is  governed  by  a  similarity-transformed 
version  of  (11),  (12)  and  has  "equally  likely"  excursions  of  "unit 
length"  in  each  direction  under  the  q-th  model.  This  sort  of 
normalization  is  discussed  more  at  the  end  of  this  section  and  in 
Section  4.1,  where  observation  and  process  noise  are  incorporated  into 
the  model.  (See  also  [11],  in  which  scaling  is  also  considered  in  the 
context  of  the  design  of  a  failure  detection  system.)  We  can  now  use  the 
columns of the matrix

\begin{bmatrix} C_q \\ C_q A_q \\ \vdots \\ C_q A_q^s \end{bmatrix} M_q  (15)

as a spanning set for Zq. We shall denote the matrix in (15) by the
non-boldface Zq. We shall, in the remainder of this paper, consistently use
a  boldface  capital  letter  to  denote  the  subspace  spanned  by  the  columns 
of  a  matrix  that  is  denoted  by  the  corresponding  non-boldface  capital. 


The criterion for the best choice of the average observation space Z0
may now be defined in the following manner. With Z1, ..., ZQ denoting the
scaled matrices in (15) whose columns span the possible subspaces in which
the observation histories may lie under normal conditions, define the
N x Qn matrix

Z = [Z1 : ... : ZQ] .  (16)

The optimum choice for Z0 is then taken to be the span of the columns of
the matrix Z0 that minimizes

|| Z - Z0 ||_F ,  (17)

subject to the constraint that rank Z0 = N-p (which ensures that the
orthogonal complement P of Z0 has dimension p). Here || . ||_F denotes the
Frobenius norm, which is defined as the square root of the sum of the
squares of the entries of the associated matrix. The matrix Z0 is thus
chosen so that the sum of the squared distances between the columns of Z
and of Z0 is minimized, subject to the constraint that Z0 contains only
N-p linearly independent columns.

The  optimization  problem  we  have  just  posed  is  easy  to  solve.  In 
particular  let  the  singular  value  decomposition  (see  [14,  15])  of  Z  be 
given  by 


Z = U \Sigma V ,  (18)

where

\Sigma = [ diag(\sigma_1, \sigma_2, \ldots, \sigma_N) : 0 ] ,  (19)

and U and V are orthogonal matrices. Here σ1 ≤ σ2 ≤ ... ≤ σN are the
singular values of Z, ordered by magnitude. Note that we have actually
assumed N ≤ Qn. If this is not the case, we can make it so without
changing the optimum choice of Z0 by padding Z with additional columns
of zeros. As shown in [17] (see also [18]), the matrix Z0 minimizing
(17) is given by

Z_0 = U [ diag(0, \ldots, 0, \sigma_{p+1}, \ldots, \sigma_N) : 0 ] V .  (20)

Moreover, since the columns of U are orthonormal, we immediately see
that the orthogonal complement of the range of the minimizing matrix Z0
is spanned by the first p left singular vectors of Z, i.e. the first p
columns of U. Consequently, an orthonormal basis for the parity space P
is given by

P = [u1 : ... : up]  (21)

and u1, ..., up define optimum redundancy relations or parity checks.*
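
The construction above can be sketched for the two-sensor example
y2 = a y1 with an uncertain scale factor. The Python fragment below is a
minimal illustration under stated assumptions, not the general procedure:
because N = 2 here, it replaces a general singular value decomposition by
the closed-form eigen-analysis of the 2 x 2 matrix Z Z^T, whose
eigenvectors are the left singular vectors of Z. The unit-length scaling of
each column and the function names unit_direction and robust_parity_2d are
hypothetical choices for this sketch.

```python
import math

def unit_direction(a):
    """Scaled spanning vector for the line y2 = a*y1, normalized to unit length."""
    norm = math.hypot(1.0, a)
    return (1.0 / norm, a / norm)

def robust_parity_2d(a_values):
    """Return (z0, p, sigma_sq) for the candidate slopes in a_values.

    z0 spans the average observation space, p is the unit parity vector
    orthogonal to it, and sigma_sq is the smallest eigenvalue of Z Z^T,
    i.e. the squared smallest singular value of Z.
    """
    # Accumulate the symmetric Gram matrix G = Z Z^T = sum_q z_q z_q^T.
    g11 = g12 = g22 = 0.0
    for a in a_values:
        z1, z2 = unit_direction(a)
        g11 += z1 * z1
        g12 += z1 * z2
        g22 += z2 * z2
    # Angle of the eigenvector belonging to the larger eigenvalue of G.
    theta = 0.5 * math.atan2(2.0 * g12, g11 - g22)
    z0 = (math.cos(theta), math.sin(theta))
    p = (-math.sin(theta), math.cos(theta))
    # Rayleigh quotient p^T G p equals the smaller eigenvalue.
    sigma_sq = (p[0] * (g11 * p[0] + g12 * p[1])
                + p[1] * (g12 * p[0] + g22 * p[1]))
    return z0, p, sigma_sq
```

For a single slope, say robust_parity_2d([2.0]), the robustness measure is
zero and p is proportional to (-a, 1), which recovers the parity relation
(10). For two slopes, z0 bisects the angle between the two unit directions,
in agreement with the picture in Figure 2.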

There  are  additional  reasons  for  choosing  this  method  for 
determining  Zq  and  P,  apart  from  the  fact  that  the  computation  just 
described  is  quite  straightforward.  Firstly,  minimization  of  the 
criterion  in  (17)  does  produce  a  space  that  is  as  close  as  possible  in  a 
natural  sense  to  a  specified  set  of  directions,  namely  the  columns  of 
{Zq, q = 1, ..., Q}. Thanks to the scaling (14), these columns represent
a complete set of "equally likely" directions in the observation space
Zq (corresponding to the "equally likely" values of the scaled state w =
[1,0,...,0]^T, [0,1,...,0]^T, etc.). A second (and more precisely stated)
reason  follows  from  an  alternative  interpretation  of  our  choice  of  P 
that  provides  some  very  useful  insight. 

Specifically,  recall  that  what  we  wish  to  do  is  to  find  a  subspace 
P  that  is  as  orthogonal  as  possible  to  all  the  subspaces  Zq.  Translating 
this  to  statements  about  bases  for  these  spaces,  we  would  like  to  choose 
an N x p matrix P, normalized by the condition that it have orthonormal
columns (i.e. P^T P = I, so that P P^T is the orthogonal projection onto the
subspace P), to make each of the matrices P^T Zq as close to zero as
possible. Now, as shown in the Appendix, the choice of P given in (21)
also minimizes

J = \sum_{q=1}^{Q} || P^T Z_q ||_F^2 ,  (22)

yielding the minimum value

J* = \sum_{i=1}^{p} \sigma_i^2 .  (23)

*Note that if σ_{p+1} = 0, then (a) Z0 actually has rank less than N-p and
(b) there is a perfectly robust parity space of dimension at least p+1.


(,  the  same  choice  of  P  can  also 
aningful  criteria. 


so  It  (22),  (23)  should  be  noted, 
tforvard  way  in  which  to  inc lode 
is  in  (22).  Specifically,  if  a 


b  described  previously,  but  with 
e  step  further,  if  we  normalize 
hink  of  them  as  representing  the 
sible  system  models.  Thus  Jj  in 
value  of  II  PTZq  lip  ,  where  the 
ncertainty.  Furthermore,  if  we 
a  state  w  with  unit  covariance 
e  interpreted  as  E<J(  II  r(k)  ||*)  , 
used  to  denote  the  vector  whose 
ity  checks  determined  by  the 


(k-s),  assuming  that  the  data  is 
j  this  with  the  probabilistic 


and  the  model  uncertainty.  It  is 
the  next  section. 


ralue  (23)  provides  us  with  an 


interpretation  of  the  singular  vali 
provides  a  sequence  of  parity  rela 
robust:  u^  is  the  most  reliable  p 
robustness  measure;  U2  is  the  next 
robustness  measure;  etc.  Consequent 
decomposition,  we  can  obtain  a  £ 
redundancy  relation  problem  for  a  £ 
length  time  history  of  output  values. 


4.  Three  Extensions 

In  this  section  we  develop  three  extensions  of  the  result  of  the 
preceding  section,  through  modifications  that  entail  no  fundamental 
increase  in  complexity.  The  treatment  of  noise  is  first  addressed,  in 
Section  4.1,  while  the  inclusion  of  known  inputs  is  discussed  in  Section 
4.2.  Finally,  the  issue  of  designing  parity  checks  for  robust  detection 
of  a  particular  failure  mode  is  examined  in  Section  4.3. 


4.1  Observation  and  Process  Noise 

In  addition  to  choosing  parity  relations  that  are  maximally 
insensitive  to  model  uncertainties,  it  is  also  important  to  choose 
relations  that  suppress  noise.  Consider  the  model 

x(k+1) = Aq x(k) + Bq u(k) ,  (27)

y(k) = Cq x(k) + Dq u(k) ,  (28)

where u(.) is a zero mean, unit covariance, white noise process. We
assume that x and y have attained stationarity, and that the steady-state
covariance of x is given by

Sq = Mq Mq^T .  (29)

The  time  window  of  observations  for  (27),  (28)  is  now  given  by 


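\begin{bmatrix} y(k-s) \\ y(k-s+1) \\ \vdots \\ y(k) \end{bmatrix} = \begin{bmatrix} C_q \\ C_q A_q \\ \vdots \\ C_q A_q^s \end{bmatrix} M_q w(k-s) + H_q \begin{bmatrix} u(k-s) \\ u(k-s+1) \\ \vdots \\ u(k) \end{bmatrix} ,  (30)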

where  w(k-s)  has  zero  mean  and  unit  covariance  —  cf.  (14),  (15)  and  the 
discussion  at  the  end  of  Section  3  —  and  Hq  has  the  same  structure  as 
in  (8),  except  that  all  matrices  are  replaced  by  their  subscripted 
versions,  since  it  is  the  q-th  model  that  is  under  consideration.  We 
shall  write  (30)  more  compactly  as 

Y(k) = Zq w(k-s) + Hq U(k) ,  (31)

with the definitions of the symbols being obvious from (30). In
particular, note that U(k) has unit covariance and is independent of
w(k-s). 
A natural extension of the minimization criterion (24), (26) is
then provided by

J = \sum_{q=1}^{Q} a_q E_q( || r(k) ||^2 )  (32)

where

r(k) = P^T Y(k)  (33)

and where Eq denotes the expectation over w(k-s) and U(k), assuming that
the data is generated by the q-th model. As before, J is to be minimized
by choice of P that satisfies P^T P = I, and the parity space P will
then be taken to be the range of P.

For simplicity, let us first assume that aq = 1 for all q. It is
then quite directly seen that

J = \sum_{q=1}^{Q} tr[ P^T (Z_q Z_q^T + H_q H_q^T) P ]

  = \sum_{q=1}^{Q} || P^T [Z_q : H_q] ||_F^2 .  (34)

From this it is evident, given our previous results, that the optimum
choice of P is computed by performing a singular value decomposition on
the matrix

T = [Z1 : H1 : ... : ZQ : HQ] .  (35)

If the aq are not all identical, then we simply modify T by scaling Zq
and Hq by \sqrt{a_q}.

It is evident from the above that the effect of noise is simply to
define additional directions to which the columns of P should be as
orthogonal as possible. That is, P is to be chosen so that the parity
check r(k) has minimal response both to the likely sequences of values
of the ideal noise-free observations (as specified by the columns of Zq)
and to the directions in which the observation noise and process noise
have their maximum effects (as determined by the columns of Hq). The
solution of this problem yields, as before, a complete set of parity
checks, corresponding to the left singular vectors of T, ordered in
terms of their degrees of insensitivity to model errors and noise (as
measured by the corresponding singular values).
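
The role of the noise columns and of the weights aq in (34), (35) can be
checked numerically on a small case. The Python sketch below is only an
illustration under stated assumptions: it treats a window of dimension
N = 2 with a single parity vector (p = 1), so that the minimizer of
|| p^T T ||_F^2 is the eigenvector of T T^T with the smallest eigenvalue,
obtained here in closed form rather than from a general singular value
decomposition. The two models, their weights, the noise directions, and the
function names are hypothetical.

```python
import math

def weighted_T(models):
    """Columns of T = [sqrt(a_1) Z_1 : sqrt(a_1) H_1 : ...] for 2-dimensional windows.

    Each model is a tuple (a_q, Z_cols, H_cols), with columns given as 2-tuples.
    """
    cols = []
    for a_q, Z_cols, H_cols in models:
        w = math.sqrt(a_q)
        cols += [(w * c[0], w * c[1]) for c in Z_cols + H_cols]
    return cols

def cost(p, cols):
    """J = || p^T T ||_F^2 for a single unit parity vector p."""
    return sum((p[0] * c[0] + p[1] * c[1]) ** 2 for c in cols)

def best_parity_vector(cols):
    """Unit eigenvector of T T^T belonging to its smallest eigenvalue."""
    g11 = sum(c[0] * c[0] for c in cols)
    g12 = sum(c[0] * c[1] for c in cols)
    g22 = sum(c[1] * c[1] for c in cols)
    # theta is the angle of the dominant eigenvector; rotate by 90 degrees.
    theta = 0.5 * math.atan2(2.0 * g12, g11 - g22)
    return (-math.sin(theta), math.cos(theta))

# Two hypothetical models with equal weights: signal directions (1, 0.8) and
# (1, 1.2), and independent sensor noise with standard deviations 0.1 and 0.3.
models = [
    (0.5, [(1.0, 0.8)], [(0.1, 0.0), (0.0, 0.3)]),
    (0.5, [(1.0, 1.2)], [(0.1, 0.0), (0.0, 0.3)]),
]
cols = weighted_T(models)
p = best_parity_vector(cols)
```

Comparing cost(p, cols) with the cost of other unit vectors gives a direct
check that no other direction achieves a smaller value of the criterion.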


4.2  Known  Inputs 


The  analysis  of  the  preceding  section  can  be  modified  somewhat  to 
allow us to consider the case in which some of the driving terms in (27)
are  known  inputs.  To  simplify  the  discussion  in  this  section,  we  assume 
that  all  of  the  components  of  u(k)  are  known  inputs.  The  extension  to 
the  case  when  there  are  both  known  inputs  and  noise  is  straightforward. 


The  key  difference  between  the  case  in  which  u(k)  is  unmeasured  and 
the  case  in  which  it  is  measured  is  that  in  the  latter  case  we  can 
adjust  the  measured  outputs  y(k)  to  account  for  the  effect  of  the 
measured  inputs  u(k)  (see  the  discussion  in  Section  2).  That  is,  we  can 
consider  defining  a  vector  of  parity  checks  of  the  form 


r(k) 


(36) 


where P^T P = I_p. The question then is how to measure the robustness
of r(k). Clearly, since U(k) is known, we can consider defining a
robustness  measure  relative  to  any  specified  input  sequence  U(k).  This 
approach  is  closer  to  the  spirit  of  the  work  of  Chow  and  Willsky  [2]. 
As  discussed  in  Section  5,  such  an  approach  allows  one  to  adjust  the 
parity  matrix  P  on-line  by  (in  effect)  scheduling  it  with  respect  to 
U(k), but the price that is paid for this is significantly greater
on-line and off-line computational complexity.


What we do instead is to follow the same philosophy we have
used up to this point. That is, we shall attempt to find a single matrix
P that minimizes the norm of r(k) on the average, as w(k-s) and U(k)
vary over their likely range of values. More precisely, we assume that
U(k) is zero mean, and

E \left\{ \begin{bmatrix} w(k-s) \\ U(k) \end{bmatrix} [ w^T(k-s), U^T(k) ] \right\} = N_q N_q^T ,  (37)


where Nq is any square root of the covariance matrix above. As an
example, if a feedback control of the form u(k) = Gv(k) is used, then

U(k) = Lq w(k-s)  (38)

for a matrix Lq that is easily written in terms of G, Aq, Bq and Mq (but
we omit the explicit details here), so that

Nq^T = [I : Lq^T] .  (39)

If  process  noise  were  also  included,  there  would  not  be  a  deterministic 
coupling  of  U(k)  and  w(k-s),  and  a  straightforward  modification  of  (38) 
would  provide  the  appropriate  form  for  Nq. 

Consider  now  the  criterion  (32),  with  all  of  the  aq  taken  to  be  1 
for  the  sake  of  simplicity.  A  direct  calculation  yields 

J = \sum_{q=1}^{Q} || P^T R_q ||_F^2 ,  (40)

where 

Nq  ,  (41) 

so  that  the  optimum  choice  of  P  is  obtained  from  the  singular  value 
decomposition of [R1 : R2 : ... : RQ].

4.3  Detection  Versus  Robustness 

The  methods  described  to  this  point  involve  measuring  the  quality 
of  redundancy  relations  in  terms  of  how  small  the  resulting  parity 
checks  are  under  normal  operating  conditions.  That  is,  good  parity 
checks  are  maximally  insensitive  to  modeling  errors  and  noise.  However, 
in  some  cases  one  might  prefer  to  broaden  the  viewpoint.  In  particular, 
there  may  be  parity  checks  that  are  not  optimally  robust  (in  the  sense 
that  we  have  discussed)  but  that  are  still  of  significant  value  because 
they  are  extremely  sensitive  to  particular  failure  modes.  In  this 
subsection, we consider a criterion that takes such a possibility into
account.  We  focus,  for  simplicity,  on  the  noise-free  case.  The 
extension  to  include  noise  or  known  inputs  as  in  the  previous  subsection 
is  straightforward. 

The  specific  problem  to  be  considered  is  the  choice  of  parity 
checks  for  the  robust  detection  of  a  particular  failure  mode.  We  assume 
that  the  unfailed  model  of  the  system  is 

x(k+1) = Aq x(k) ,  (42)

y(k) = Cq x(k) ,  (43)

while if the failure has occurred the model is

x(k+1) = \bar{A}_q x(k) ,  (44)

y(k) = \bar{C}_q x(k) ,  (45)

where overbars denote quantities associated with the failed model.

For example, if we return to the simple case y2(k) = a y1(k), then
under unfailed conditions one might have

a1 ≤ a ≤ a2  (46)

while after a failure

\bar{a}_1 ≤ a ≤ \bar{a}_2 .  (47)


This is illustrated pictorially in Figure 3. In this case, one would
like to choose the line P onto which one projects in such a way that a
small projection is obtained if no failure has occurred and a large
value results if a failure occurs. That is, we would like P to be "as
orthogonal as possible" to Z and "as parallel as possible" to \bar{Z},
where Z and \bar{Z} denote the sets of observations possible under
unfailed and failed conditions, respectively.

Returning to the general problem, we again assume that q takes on
one of Q possible values, and we let Zq and \bar{Z}_q denote the counterparts
of Zq in (15) for the unfailed and failed models, respectively. We now
have a tradeoff: we would like to make P^T Zq as small as possible for all
q and to make P^T \bar{Z}_q as large as possible. A natural criterion, for
minimization over all P satisfying P^T P = I, is provided by

J = \sum_{q=1}^{Q} ( || P^T Z_q ||_F^2 - || P^T \bar{Z}_q ||_F^2 ) .  (48)

If we define the matrices

H = [Z1 : Z2 : ... : ZQ : \bar{Z}_1 : \bar{Z}_2 : ... : \bar{Z}_Q]  (49)

and

S = block diagonal [I_{Qn}, -I_{Qn}] ,  (50)

then

J = tr [P^T H S H^T P] .  (51)

Figure 3: Illustrating Robust Detectability. Here Z = {Z(a), a1 ≤ a ≤ a2}
represents the set of values of (y1, y2) that can occur under normal
operation, while \bar{Z} = {Z(a), \bar{a}_1 ≤ a ≤ \bar{a}_2} represents the
corresponding set after the occurrence of a failure.


It is straightforward (see [3]) to show that a minor modification
of the result in [17] leads to the following solution. We perform an
eigenvector-eigenvalue analysis on the matrix

H S H^T = U \Lambda U^T  (52)

where U is orthogonal and

\Lambda = diagonal(\lambda_1, \ldots, \lambda_N) ,  \lambda_1 ≤ ... ≤ \lambda_N .  (53)

Then the optimum choice for P is the first p columns of U:

P = [u1 : ... : up] .  (54)

The corresponding minimum value of J in (48), (51) is

J* = \sum_{i=1}^{p} \lambda_i .  (55)

Two comments are in order about this solution. The first is that
no more than Qn of the $\lambda_i$ can be positive. In fact the parity check
based on $u_i$ is likely to have larger values under failed rather than
unfailed conditions if and only if $\lambda_i < 0$. Thus we immediately see
that the maximum number of useful parity relations for detecting this
particular failure mode equals the number of negative eigenvalues of
$HSH^T$.
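
The eigenvalue procedure of (49)-(55) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `failure_parity_checks`, the choice $Z(a) = [1, a]^T$ for the simple case $y_2(k) = a\,y_1(k)$, and the sample parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def failure_parity_checks(Z_unfailed, Z_failed, p):
    """Minimize tr(P^T H S H^T P) subject to P^T P = I (hypothetical helper).

    Z_unfailed, Z_failed: lists of N x n arrays, one per parameter value q.
    Returns the N x p matrix P and all eigenvalues in ascending order.
    """
    # H stacks the unfailed blocks first, then the failed blocks, as in (49).
    H = np.hstack(list(Z_unfailed) + list(Z_failed))
    n_unf = sum(Z.shape[1] for Z in Z_unfailed)
    n_fail = sum(Z.shape[1] for Z in Z_failed)
    # S weights unfailed columns by +1 and failed columns by -1, as in (50).
    S = np.diag(np.concatenate([np.ones(n_unf), -np.ones(n_fail)]))
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    lam, U = np.linalg.eigh(H @ S @ H.T)
    return U[:, :p], lam

# Assumed toy model: Z(a) = [1, a]^T with illustrative parameter ranges.
a_unfailed = [0.9, 1.0, 1.1]
a_failed = [1.9, 2.0, 2.1]
Z_unf = [np.array([[1.0], [a]]) for a in a_unfailed]
Z_fail = [np.array([[1.0], [a]]) for a in a_failed]

P, lam = failure_parity_checks(Z_unf, Z_fail, p=1)
J_star = lam[:1].sum()               # minimum cost, sum of the p smallest eigenvalues
num_useful = int(np.sum(lam < 0))    # number of negative eigenvalues
```

In this toy case only one eigenvalue is negative, so only one parity relation is useful for this failure mode in the sense described above.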

As a second comment, let us contrast the procedure we use here with
the singular value decomposition of Z used in Section 3, which
corresponds essentially to performing an eigenvector-eigenvalue analysis
of $ZZ^T$. First, assume that precisely the first K of the $\lambda_i$ are
negative, and define

$$\Sigma = \text{diagonal}\,[\sigma_1, \ldots, \sigma_N] . \qquad (57)$$



From (52) we have that

$$H S H^T = U \Sigma S \Sigma U^T . \qquad (58)$$

Assuming that $\Sigma$ is nonsingular (which implies K = Qn), define

$$V = \Sigma^{-1} U^T H . \qquad (59)$$

Then V is S-orthogonal,

$$V S V^T = S , \qquad (60)$$

and H has what we call an S-singular value decomposition

$$H = U \Sigma V . \qquad (61)$$

Thus, instead of the singular value decomposition of Z that we used in
Section 3, the modified problem considered in this subsection calls for
the S-singular value decomposition of H.
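
A minimal numerical sketch of this factorization is given below. It rests on assumptions that are not stated in the text as it survives here: the $\sigma_i$ are taken to be $\sqrt{|\lambda_i|}$, H is a random matrix of hypothetical size, and no eigenvalue is zero. For a general N x 2Qn matrix H the product $V S V^T$ then comes out as the N x N signature matrix diagonal$(\mathrm{sign}\,\lambda_i)$, which is the form of S-orthogonality that the sketch checks.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 4, 3                                   # hypothetical sizes: H is N x 2m
H = rng.standard_normal((N, 2 * m))
S = np.diag(np.concatenate([np.ones(m), -np.ones(m)]))

# Eigen-analysis of H S H^T, eigenvalues in ascending order.
lam, U = np.linalg.eigh(H @ S @ H.T)

# Assumption: sigma_i = sqrt(|lambda_i|), with every lambda_i nonzero.
sigma = np.sqrt(np.abs(lam))
Sigma = np.diag(sigma)

# V = Sigma^{-1} U^T H, so that H = U Sigma V by construction.
V = np.diag(1.0 / sigma) @ U.T @ H

reconstruction = U @ Sigma @ V                # should reproduce H
signature = V @ S @ V.T                       # should equal diag(sign(lambda_i))
```

The number of -1 entries on the diagonal of `signature` equals K, the number of negative eigenvalues.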
5.  Discussion 


This  paper  has  developed  methods  for  determining  robust  parity 
relations  for  failure  detection  in  dynamic  systems.  The  methods  build 
on  the  geometric  interpretation  of  parity  checks  as  orthogonal 
projections  of  windows  of  observations  onto  subspaces  that  are  as 
orthogonal  as  possible  to  the  observation  sequence,  given  the  presence 
of model uncertainties and noise. We also considered the modification
of  this  criterion  to  enable  choice  of  parity  checks  for  the  detection  of 
a  particular  failure  mode.  In  each  of  the  cases  considered,  a  single 
singular  value  decomposition  (or  a  variation  of  it,  in  the  case  of 
Section  4.3)  produced  a  complete  sequence  of  orthogonal  parity 
relations,  ordered  in  terms  of  a  meaningful  measure  of  robustness.  In 
this  section  we  provide  brief  discussions  of  several  issues  concerned 
with  the  interpretation  and  use  of  these  results. 


5.1  A  Graphical  Picture  of  Robust  Redundancy 

In  all  three  of  the  formulations  considered  (in  Sections  3,  4.1, 
and  4.2),  we  considered  the  problem  of  finding  the  p  best  parity  checks. 
An  obvious  question,  then,  is  what  is  a  good  value  of  p?  While  our 
results  do  not  give  a  precise  answer  to  this  question,  they  do  provide  a 
basis  for  obtaining  a  picture  of  the  level  of  robust  redundancy  in  a 
particular  system  configuration,  as  outlined  next. 

Recall  that  the  solutions  to  our  problems  provide  rank-ordered 
lists  of  parity  relations,  with  a  figure  of  merit  for  each  relation 
given  by  a  corresponding  singular  value  (or  eigenvalue  for  the  case  of 
Section 4.3). For example, consider the criterion (22). As we have
seen, minimization of J over all choices of the parity check matrix P
subject to the constraint that $P^TP = I_p$ (i.e. that we specify exactly p
parity checks) results in the value J* given in (23), namely the sum of
the squares of the p smallest singular values of the matrix Z in (18).
The solid curve in Figure 4 illustrates a plot of this minimum value J*
as a function of p. Note that this curve must be convex, since the
increment in J* when we increase the number of parity checks from p to
p+1 is $\sigma_{p+1}^2$, which is at least as large as the squares of any of
the p previous singular values. Furthermore, in this illustration the
knee in the solid curve indicates a sharp increase in the singular
values, which in turn points to a value of p beyond which the level of
robustness decreases markedly.
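
The robust redundancy curve described above is simply the running sum of the squared singular values taken in increasing order. The following sketch computes it for a randomly generated matrix; the function name `robust_redundancy_curve` and the matrix sizes are assumptions made for illustration only.

```python
import numpy as np

def robust_redundancy_curve(Z):
    """Return J*(p) for p = 1, ..., N as an array (hypothetical helper)."""
    sigma = np.linalg.svd(Z, compute_uv=False)   # NumPy orders these descending
    sigma_sq = np.sort(sigma ** 2)               # reorder: smallest first
    return np.cumsum(sigma_sq)                   # J*(p) = sum of p smallest sigma_i^2

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 12))                 # hypothetical N = 5, Qn = 12
J_curve = robust_redundancy_curve(Z)

# Increment from p to p+1 is sigma_{p+1}^2; nondecreasing increments
# are exactly the convexity property noted in the text.
increments = np.diff(J_curve, prepend=0.0)
```

Plotting `J_curve` against p for two candidate sensor sets would give the kind of comparison that Figure 4 illustrates.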


Plots as in Figure 4 can also be of value in comparing different
system configurations. In particular, in specifying a sensor complement
for a particular system, one is certainly interested in finding a set of
sensors that provides a sufficient level of robust redundancy to allow
accurate failure detection to be performed. Returning to Figure 4, the
dashed  line  might  correspond  to  the  robust  redundancy  curve  for  an 
alternate  sensor  set.  This  set  has  a  higher  level  of  robust  redundancy 
than  the  one  corresponding  to  the  solid  line,  since  the  dashed  curve 
lies  below  the  solid  one.  Clearly  this  is  not  a  sufficient  reason  to 
state  that  the  alternate  sensor  set  is  superior  to  the  original  one  — 
e.g.  if  the  alternate  set  was  obtained  by  adding  several  sensors  to  the 
original  set,  one  would  have  to  check  that  there  is  enough  additional 
redundancy  to  permit  the  detection  of  the  larger  set  of  possible 
failures  associated  with  this  expanded  sensor  set  —  but  it  does  provide 
useful  information  for  this  design  process. 

Finally,  we  note  that  throughout  the  paper  we  have  assumed  a  fixed 
order  s  for  the  parity  checks  under  consideration.  In  any  application 
one  would,  of  course,  want  to  consider  several  values  of  s.  There  are 
clear  advantages  (in  terms  of  response  time,  and  complexity  of 
implementation) in considering small values of s, but the dynamics of a
system may be such that there are important relationships of
particularly high order. What one can imagine doing is solving the
robust redundancy problem for s = 1, 2, .... Each such problem would
result in a curve as in Figure 4, with the curve for each successive
value of s lying below the preceding one. While this would appear to
indicate that larger values of s always produce additional, useful
parity checks, this is not necessarily the case: one must check to see
if these additional redundancy relations are truly useful or are simply
nonminimal realizations of lower-order parity checks. For example, if
$y_2(k) = a\,y_1(k)$, then $y_2(k) - a\,y_1(k)$ is a valid parity check, but so is
$y_2(k) + y_2(k-1) - a\,y_1(k) - a\,y_1(k-1)$. See [3] for a polynomial matrix
characterization  of  a  complete  set  of  minimal-order  parity  checks  for 
deterministic  linear  systems  and  for  a  numerical  example  illustrating 
the  issues  raised  in  this  section. 
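
The nonminimal example above can be checked numerically. In the sketch below the value of a, the data, and the small mismatch added to $y_2$ are all hypothetical; the point is only that the higher-order residual is the sum of two consecutive values of the lower-order one and so carries no additional information.

```python
import numpy as np

a = 0.7                                        # hypothetical parameter value
rng = np.random.default_rng(2)
y1 = rng.standard_normal(50)
# A small mismatch is added so that the residuals are not identically zero.
y2 = a * y1 + 0.01 * rng.standard_normal(50)

# Minimal-order parity check: r(k) = y2(k) - a*y1(k).
r_first = y2 - a * y1

# Higher-order check: y2(k) + y2(k-1) - a*y1(k) - a*y1(k-1), for k >= 1.
r_second = y2[1:] + y2[:-1] - a * y1[1:] - a * y1[:-1]

# The higher-order residual is r(k) + r(k-1).
same_information = np.allclose(r_second, r_first[1:] + r_first[:-1])
```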


5.2  Alternate  Robustness  Criteria 


In  [2],  Chow  and  Willsky  consider  a  somewhat  different  formulation 
of  the  robust  parity  check  problem.  The  criterion  in  [2]  has  several 
significant  differences  from  the  one  we  have  used  here,  and  in  this 
section  we  describe  the  relationship  between  these.  In  the  process  we 
provide  additional  motivation  for  the  present  formulation.  We  also 
indicate  several  other  criteria  that  in  a  sense  represent  intermediate 
steps  between  [2]  and  the  present  paper,  and  that  provide  some  useful 
insights. A more thorough development of these can be found in [3].


The  model  considered  in  [2]  is  a  modified  version  of  (27),  (28) 
that  includes  known  inputs  and  noise,  and  in  which  the  model 
uncertainties  are  not  constrained  to  a  finite  set  of  values.  As 
discussed  in  Section  4.2  and  the  Appendix,  there  are  direct  ways  in 
which  one  can  incorporate  known  inputs  and  continuous  parameter 
variations  into  the  present  formulation.  The  critical  difference 
between [2] and our approach is the specific criterion chosen to define
robustness. In particular, the principal problem posed and solved in
[2] is the determination of the single best parity check r(k) (so p = 1),
where  "best"  is  defined  as  that  with  the  minimum  worst-case  mean-squared 
value  over  the  specified  range  of  parameter  uncertainties,  with  the 
system  at  a  specified  operating  point  --  i.e.  the  known  input  is  assumed 
to  take  on  a  specified  constant  value,  and  the  state  x(k-s)  at  the  start 
of  the  data  window  is  assumed  to  be  at  the  equilibrium  state 
corresponding  to  the  constant  control.  While  the  consideration  of 
operation  at  a  particular  set  point  does  allow  one  to  consider  adapting 
parity  checks  to  changing  operating  conditions,  this  flexibility  is 
achieved  at  the  expense  of  requiring  that  one  solve  a  complex  nonlinear 
optimization  problem.  Moreover,  if  one  wishes  to  consider  finding 
several  parity  checks,  one  must  either  solve  one  nonlinear  optimization 
problem  of  greater  complexity  or  a  sequence  of  problems  of  equal 
complexity  for  each  additional  parity  check. 

As  discussed  in  [3],  if  one  removes  the  operating  point  constraint 
of  [2]  and  assumes  instead  that  the  initial  state  is  completely 
unconstrained,  one  is  led  to  a  criterion  in  which  a  parity  space  P  has 
to  be  chosen  to  maximize  either  the  minimum  or  average  angle  P  makes 
with the observation space $Z_q$ as q ranges over its full set of values.
Here the cosine of the angle between two subspaces is defined as the
maximum length of the projection of a unit vector from one space onto
the other. While for any two subspaces this angle can be calculated
using singular values [3], the maximization of the average or worst-case
value of this angle is still a very complex nonlinear optimization
problem. However, on reversing the steps of computing angles and
averaging over parameter uncertainties, we are led to first compute a
subspace that is the average of the $Z_q$ and then choose P to be
orthogonal  to  this  average.  This  is  very  nearly  the  criterion  we 
introduced  in  Section  3. 
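
The subspace angle defined above can be computed from singular values along the following lines. This is a sketch under assumptions: the function name `subspace_cosine` is hypothetical, each input matrix is assumed to have full column rank, and the example subspaces are chosen only for illustration.

```python
import numpy as np

def subspace_cosine(A, B):
    """Cosine of the angle between span(A) and span(B) (hypothetical helper).

    With orthonormal bases Qa and Qb, the longest projection of a unit
    vector from one space onto the other is the largest singular value
    of Qa^T Qb.
    """
    Qa, _ = np.linalg.qr(A)                    # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                    # orthonormal basis for span(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)[0]

plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span{e1, e2}
normal = np.array([[0.0], [0.0], [1.0]])                 # span{e3}
tilted = np.array([[1.0], [0.0], [1.0]])                 # 45 degrees from e3

cos_orth = subspace_cosine(normal, plane)      # normal is orthogonal to the plane
cos_tilt = subspace_cosine(normal, tilted)     # cosine of a 45 degree angle
```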

Specifically,  as  shown  in  [3]  and  [16],  in  this  case  we  again 
choose  the  matrix  ZQ  to  minimize  (17),  but  now  with  the  columns  of  the 
matrices $Z_q$ chosen to form orthonormal bases for the spaces $Z_q$. The only
sensor  noise  in  yj^-!)*  The  model 
any  of  the  many  existing  sophistieat 


riterion  ve  have  used  is  the 
—  i.e.  instead  of  viewing  the 
ied,  we  specify  its  covariance, 
interpretation  of  maximizing  an 
ice  orthonormal  bases  for  the  Z^ 
(efined  in  (15)),  but  the  use  of 
tain  a  practically  meaningful 


tv  Checks 

check,  the  question  arises  as  to 
i  and  Willsky  provide  a  detailed 
I  we  shall  not  repeat  it  here, 
f  comments  in  order  to  point  to 


on  which  we  have  focused  in  this 
:tation  is  averaged  over  model 
1  conditions.  This  criterion  is 
i  of  an  open-loop  [2]  failure 
r(k)  calculated  over  an  interval 
Lsions  (e.g.  by  comparing  the  suit 
tie  interval  to  a  threshold). 

ty  check  to  define  a  closed- loop 
fically,  as  mentioned  in  Section 
i  defining  a  dynamic  model.  For 


(62) 

»  between  the  change  in  measured 
tion,  j2>  scaled  by  the  sampling 
lodel  of  the  form 


(63) 

(64) 

ree  value  of  yj,  and  the  process 
iriations  of  r(k)  from  zero  under 
sling  error)  and  the  presence  of 


For  example,  one  could  considei 
on  the  innovations  Y(k)  from  a  Kali 
natural  measure  of  robustness  in  thi 
This  in  turn  raises  the  question  oi 
finding  P)  to  directly  minimize 
interesting  and  meaningful  criterion 
is  an  extremely  complicated  and  n 
methods  of  this  paper  would  not  diri 
remains  to  be  determined  (a)  if  an  e 
solving  this  problem  and  (b)  under  \ 
performance  improvements  can  be  o 
closed-loop  innovations. 

As  a  final  comment,  we  note  I 
checks  as  reduced-order  models  ra 
constructions  developed  here  prov 
reduction.  The  exploration  of  this  i 
we  note  one  interesting  point.  Spe 
such  as  (62)  specifies  is  a  constraii 
components  of  y(k).  If  one  wishes 
dynamic  model  for  the  evolution  of  < 
(64),  then  the  other  components  of  t 
to  this  model. 


*  It  is  interesting  to  note  that  a 
used  in  [5,13]  were  used  in  an  open- 
relation  was  used  to  design  a  i 
innovations  were  used  to  detect  alti 
Consider the problem of choosing an N×p matrix P to minimize

$$J = \sum_{q=1}^{Q} \|P^T Z_q\|_F^2 \qquad (A.1)$$

subject to the constraint that $P^TP = I$. Note first that

$$J = \|P^T Z\|_F^2 = \mathrm{tr}(P^T Z Z^T P) \qquad (A.2)$$

where Z is defined in (16). As discussed in Section 3, we assume
without loss of generality that N < Qn. Let the singular value
decomposition of Z be as given in (18), (19).

We now show that the minimum value of J is

$$J^* = \sum_{i=1}^{p} \sigma_i^2 \qquad (A.3)$$

and the optimum choice of P is

$$P = [u_1 : u_2 : \cdots : u_p] \qquad (A.4)$$

where the $u_i$ are the first p left singular vectors of Z. To do this, we
use the following elementary result, which is a direct consequence of
the Courant-Fischer minimax principle [3, 14]: Suppose that

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \qquad (A.5)$$

is n×n, symmetric, and positive semidefinite. Suppose also that $A_{11}$ is
m×m, and let $\lambda_i(A)$, $\lambda_i(A_{11})$ denote the i-th smallest eigenvalue of A,
$A_{11}$ respectively. Then

$$\lambda_i(A) \le \lambda_i(A_{11}), \qquad i = 1, \ldots, m . \qquad (A.6)$$

Consider then any choice of P satisfying the constraint $P^TP = I$,
and augment this matrix with N-p additional columns so that the square
matrix

$$F = [P : D] \qquad (A.7)$$

is orthogonal. Then

$$F^T Z Z^T F = \begin{bmatrix} P^T Z Z^T P & * \\ * & * \end{bmatrix} . \qquad (A.8)$$



Applying (A.6) to (A.8) and using both (A.2) and the fact that F is
orthogonal, we see that

$$\sum_{i=1}^{p} \sigma_i^2 = \sum_{i=1}^{p} \lambda_i(ZZ^T) = \sum_{i=1}^{p} \lambda_i(F^T Z Z^T F) \le \mathrm{tr}(P^T Z Z^T P) = \|P^T Z\|_F^2 . \qquad (A.9)$$


From (18) we see that

$$ZZ^T = U \Sigma \Sigma^T U^T \qquad (A.10)$$

with

$$\Sigma\Sigma^T = \text{diagonal}\,[\sigma_1^2, \ldots, \sigma_N^2] . \qquad (A.11)$$

From this we see that the inequality in (A.9) becomes an equality if P
is chosen as in (A.4), thereby proving our assertion.

We note that from this analysis we can directly deduce that the
same choice of P minimizes a variety of other criteria. For example, an
interesting one is

$$\det(P^T Z Z^T P) \qquad (A.12)$$

which has the interpretation of minimizing the volume of the projection
of the columns of Z onto the subspace P. The proof that the same P
minimizes (A.12) is also a straightforward consequence of (A.6) and
(A.8). Specifically

$$\det(P^T Z Z^T P) = \prod_{i=1}^{p} \lambda_i(P^T Z Z^T P) \ge \prod_{i=1}^{p} \lambda_i(ZZ^T) = \prod_{i=1}^{p} \sigma_i^2 \qquad (A.13)$$

with equality resulting once again if P is taken as in (A.4).
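
Both the trace result (A.3), (A.4) and the determinant result (A.12), (A.13) can be checked numerically. The sketch below does so for a random matrix Z of hypothetical size, comparing the singular-vector choice of P against randomly drawn matrices with orthonormal columns; the helper names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, Qn, p = 4, 6, 2                             # hypothetical sizes with N < Qn
Z = rng.standard_normal((N, Qn))

# NumPy returns singular values in descending order; reverse them so that
# index 0 corresponds to the smallest singular value.
U, sigma, _ = np.linalg.svd(Z, full_matrices=False)
U, sigma = U[:, ::-1], sigma[::-1]
P_opt = U[:, :p]                               # the choice (A.4)

def trace_cost(P):
    return np.trace(P.T @ Z @ Z.T @ P)         # criterion (A.2)

def det_cost(P):
    return np.linalg.det(P.T @ Z @ Z.T @ P)    # criterion (A.12)

trace_min = np.sum(sigma[:p] ** 2)             # right-hand side of (A.3)
det_min = np.prod(sigma[:p] ** 2)              # right-hand side of (A.13)

# Costs for random P with orthonormal columns, obtained by QR factorization.
random_costs = []
for _ in range(200):
    P_rand, _ = np.linalg.qr(rng.standard_normal((N, p)))
    random_costs.append((trace_cost(P_rand), det_cost(P_rand)))
```

None of the random choices should fall below `trace_min` or `det_min`, in agreement with (A.9) and (A.13).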

Finally, note that (as can be seen in (A.10)) we are actually using
the eigenvalue-eigenvector decomposition of

$$ZZ^T = \sum_{q=1}^{Q} Z_q Z_q^T$$

in order to find the optimal choice of P. This suggests a direct
generalization of the criterion (A.1) to allow continuous parameter
variations. Specifically, assume that $q \in K$, a compact subset of a
finite-dimensional Euclidean space, and consider the following
criterion:

$$J = \int_K \|P^T Z_q\|_F^2 \, dq = \mathrm{tr}\left\{ P^T \left( \int_K Z_q Z_q^T \, dq \right) P \right\} \qquad (A.14)$$

(As before, this can be interpreted as $E[\|r(k)\|^2]$, where we have
absorbed the square root of the probability density of q into the
definition of $Z_q$).

Consider the eigenvalue-eigenvector representation

$$\int_K Z_q Z_q^T \, dq = U \Lambda U^T \qquad (A.15)$$

where $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$. Then the first p columns of U
define the optimal choice of P. Note also that (assuming that
$\lambda_1 > 0$) if we define

$$V_q = \Lambda^{-1/2} U^T Z_q \qquad (A.16)$$

then

$$Z_q = U \Lambda^{1/2} V_q \qquad (A.17)$$

where $U^TU = I$ and

$$\int_K V_q V_q^T \, dq = I . \qquad (A.18)$$

Hence (A.17) is the singular value decomposition of the map Z.